
Support skipping tracing of selected pure modules #308


Merged: 11 commits into main on Jun 18, 2025

Conversation

@tengyifei (Collaborator) commented Jun 13, 2025

This PR adds a `model.pure_modules` option to the trainer, which selects the modules to run under `@assume_pure`.

The primary benefit is that rich profiles become available during the backward pass, because `jax.vjp` preserves framework scopes there.

This PR updates the PyTorch/XLA pin to Jun 17 because it relies on pytorch/xla#9360.

Regular profile

python3 torchprime/torch_xla_models/train.py \
    model/sharding=llama-fsdp-tp ici_mesh.fsdp=8 \
    task.global_batch_size=8 model.attention_kernel=splash_attention \
    logging_steps=1 task.max_steps=15

[Screenshot: profile without assume_pure]

Profile with assume_pure

python3 torchprime/torch_xla_models/train.py \
    model/sharding=llama-fsdp-tp ici_mesh.fsdp=8 \
    task.global_batch_size=8 model.attention_kernel=splash_attention \
    logging_steps=1 task.max_steps=15 \
    model.pure_modules=[LlamaMLP,EinsumLinear]

[Screenshot: profile with assume_pure]
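As a rough illustration of what `model.pure_modules=[LlamaMLP,EinsumLinear]` could do, the sketch below walks a module tree and wraps every submodule whose class name matches, without descending into already-wrapped subtrees. All names here (`Module`, `PureModule`, `wrap_pure_modules`) are simplified stand-ins for illustration, not the actual torchprime or PyTorch/XLA implementation.

```python
class Module:
    """Minimal stand-in for torch.nn.Module with named children."""
    def __init__(self, **children):
        self._children = dict(children)

    def named_children(self):
        return self._children.items()

    def set_child(self, name, child):
        self._children[name] = child


class PureModule(Module):
    """Stand-in wrapper; the real one would run the wrapped module under @assume_pure."""
    def __init__(self, wrapped):
        super().__init__(wrapped=wrapped)
        self.wrapped = wrapped


# Illustrative module classes matching the names in the command above.
class LlamaMLP(Module): pass
class EinsumLinear(Module): pass
class Attention(Module): pass


def wrap_pure_modules(module, pure_names):
    """Recursively replace submodules whose class name is in pure_names.

    Matching modules are wrapped outermost-first: once a subtree is wrapped,
    its children are not re-wrapped.
    """
    for name, child in list(module.named_children()):
        if type(child).__name__ in pure_names:
            module.set_child(name, PureModule(child))
        else:
            wrap_pure_modules(child, pure_names)
    return module


model = Module(
    mlp=LlamaMLP(proj=EinsumLinear()),
    attn=Attention(qkv=EinsumLinear()),
)
wrap_pure_modules(model, {"LlamaMLP", "EinsumLinear"})
# model.mlp is now a PureModule wrapping the whole LlamaMLP;
# only attn.qkv (an EinsumLinear) is wrapped inside attn.
```

Wrapping at the outermost matching module means the entire subtree is traced once as a pure function, which is what lets the tracer skip re-tracing it on every step.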

@tengyifei tengyifei force-pushed the yifeit/assume-pure branch 2 times, most recently from cd77464 to d577a40 Compare June 14, 2025 02:07
tengyifei added a commit to pytorch/xla that referenced this pull request Jun 16, 2025
- Support nested tuples in `assume_pure(mark_sharding)`
- Add a `PureModule` from AI-Hypercomputer/torchprime#308
- Support `PureModule(EinsumLinear)` which uses `torch.ops.xla.einsum_linear_forward`
@tengyifei tengyifei force-pushed the yifeit/assume-pure branch from 574f142 to c8ba416 Compare June 17, 2025 08:21
@tengyifei changed the title from "Draft: Yifeit/assume pure" to "Support skipping tracing of selected pure modules" Jun 17, 2025
@tengyifei tengyifei force-pushed the yifeit/assume-pure branch 3 times, most recently from 12014b1 to ef24790 Compare June 17, 2025 22:39
@tengyifei tengyifei marked this pull request as ready for review June 17, 2025 23:53
@vlasenkoalexey (Collaborator) left a comment:

thanks for adding this functionality, this is great

@tengyifei tengyifei force-pushed the yifeit/assume-pure branch from 185cd7b to 961d168 Compare June 18, 2025 05:50
@tengyifei tengyifei merged commit 82bf6da into main Jun 18, 2025
15 checks passed
@tengyifei tengyifei deleted the yifeit/assume-pure branch June 18, 2025 06:34